Analyze jobs with the highest position percentage

Variables with the prefixes ARBLONN_ARB_ and ARBLONN_LONN_ contain information on all employment relationships registered through the A scheme (a-ordningen). The unit of these data is the job (employment relationship), not the person. Because an individual can in principle hold more than one job at any given time, the dataset will contain more observations than individuals at any given time.
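To make the unit level concrete, here is a minimal Python sketch on made-up data. It is not microdata.no code; the records, person ids and values are all hypothetical. It only shows why a job-level table has more rows than individuals.

```python
from collections import Counter

# Hypothetical job-level records: one row per employment relationship,
# so the same personid can occur on several rows.
jobs = [
    {"personid": "A", "position_pct": 100.0},
    {"personid": "A", "position_pct": 20.0},
    {"personid": "B", "position_pct": 60.0},
    {"personid": "C", "position_pct": 50.0},
    {"personid": "C", "position_pct": 50.0},
]

# Count how many jobs each person holds.
jobs_per_person = Counter(row["personid"] for row in jobs)

n_jobs = len(jobs)                 # number of observations (jobs)
n_persons = len(jobs_per_person)   # number of distinct individuals
multi_job_holders = sorted(p for p, n in jobs_per_person.items() if n > 1)

print(n_jobs, n_persons, multi_job_holders)
```

In this toy table there are more jobs than persons because A and C each hold two jobs. That is the situation the selection step in the script has to deal with.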

When you want to produce statistics or analyses of jobs at the individual level, you usually need one selected employment relationship per individual, e.g. the main employment relationship, the job with the highest position percentage, the job with the highest agreed working hours, or the job with the highest monthly salary.

The example below shows how to analyze the job with the highest position percentage per individual.

require no.ssb.fdb:31 as db

//Create a job dataset of active jobs as of 16/7 2023, and find the job with the highest position percentage per individual
create-dataset job_data_max
import db/ARBLONN_ARB_YRKE_STYRK08 2023-07-16 as occupation
import db/ARBLONN_ARB_STILLINGSPST 2023-07-16 as position_pct
import db/ARBLONN_ARB_HOVEDARBEID 2023-07-16 as main_job
import db/ARBLONN_ARB_ANSETTELSESFORM 2023-07-16 as employment_form
import db/ARBLONN_ARB_ARBEIDSTID 2023-07-16 as working_hours
import db/ARBEIDSFORHOLD_PERSON as personid

textblock
Position percentage for all active jobs as of 16/7 2023 in the job dataset:
endblock
summarize position_pct
tabulate main_job
tabulate main_job, summarize(position_pct)

//Make a copy of the job dataset before it is aggregated
clone-dataset job_data_max job_data

//Aggregate the job dataset to the individual level, with information about the highest position percentage per individual
collapse(max) position_pct -> max_position_pct, by(personid)
textblock
Position percentage for jobs with the highest position percentage per individual:
endblock
summarize max_position_pct

//Link information about the highest position percentage to the complete job dataset
merge max_position_pct into job_data on personid

//Use the information to remove jobs in the job dataset that do not have the highest position percentage
use job_data
keep if position_pct == max_position_pct
textblock
Position percentage for jobs with the highest position percentage per individual.

Note that the number of jobs selected here is higher than the number of individuals in the aggregated dataset. This is because of ties: a person can hold, for example, two (or more) 100% positions, and all tied jobs are kept. Such cases are few, however:
endblock
summarize position_pct
histogram position_pct, bin(5) percent

textblock
Position percentage and agreed working hours for jobs with the highest position percentage, by form of employment:
endblock
tabulate employment_form, missing
tabulate employment_form, summarize(position_pct, working_hours)

//Aggregate the job data to the individual level (averaging over tied jobs), and link personal data to create person-level statistics
collapse(mean) position_pct working_hours, by(personid)

create-dataset persons
import db/BEFOLKNING_KJOENN as gender
merge gender into job_data

use job_data
textblock
Position percentage and agreed working hours for jobs with the highest position percentage, by gender:
endblock
tabulate gender, missing
tabulate gender, summarize(position_pct, working_hours)
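The logic of the script can be traced on a small made-up table. The Python sketch below is an illustration only, not microdata.no code: the records, the gender codes and the names `max_pct`, `top_jobs` and `persons` are hypothetical. It mirrors the three steps: find the maximum per person, keep the jobs that equal it (ties included), then average the tied jobs to get one row per person.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical job-level data; person C holds two tied 50% positions.
jobs = [
    {"personid": "A", "position_pct": 100.0, "working_hours": 37.5},
    {"personid": "A", "position_pct": 20.0, "working_hours": 7.5},
    {"personid": "B", "position_pct": 60.0, "working_hours": 22.5},
    {"personid": "C", "position_pct": 50.0, "working_hours": 18.0},
    {"personid": "C", "position_pct": 50.0, "working_hours": 20.0},
]
# Hypothetical person-level data (the codes are made up).
gender = {"A": "1", "B": "2", "C": "2"}

# Step 1, like collapse(max) ... by(personid): highest percentage per person.
max_pct = {}
for row in jobs:
    pid = row["personid"]
    max_pct[pid] = max(row["position_pct"], max_pct.get(pid, row["position_pct"]))

# Step 2, like merge followed by keep if: keep only jobs equal to the maximum.
# Ties survive, so there can still be more jobs than persons.
top_jobs = [row for row in jobs if row["position_pct"] == max_pct[row["personid"]]]

# Step 3, like collapse(mean) ... by(personid): average tied jobs, one row per person.
grouped = defaultdict(list)
for row in top_jobs:
    grouped[row["personid"]].append(row)

persons = {
    pid: {
        "position_pct": mean(r["position_pct"] for r in rows),
        "working_hours": mean(r["working_hours"] for r in rows),
        "gender": gender.get(pid),  # like merging gender onto the person rows
    }
    for pid, rows in grouped.items()
}

print(len(top_jobs), len(persons))
print(persons["C"])
```

Four jobs remain after step 2 for three persons, because of the tie for person C. Averaging in step 3 resolves the tie. Other tie-breaking rules, such as keeping the tied job with the most agreed working hours, would be equally defensible depending on the analysis.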